06. Creating Metrics

Funnels

There are additional concepts and terms that are commonly used for designing
experiments, especially for web-based studies. In a web experiment, you'll
often think of the user funnel. A funnel is the flow of steps you expect a
user of your product to take. Typically, the funnel ends at the place where
your main evaluation metric is recorded, and includes a step where your
experimental manipulation can be performed. For example, we might think of the
following steps for someone to purchase a product in an online store:

  • Visit the site homepage
  • Search for a desired product or click on a product category
  • Click on a product image
  • Add the product to the cart
  • Check out and finalize purchase

One property to note about user funnels is that typically there will be some
dropoff in the users that move from step to step. This is much like how an
actual funnel narrows from a large opening to a small exit. Outside of an
experiment, funnels can be used to analyze user flows. Observations from these
flows can then be used to motivate experiments that try to reduce the dropoff
rates.
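
As a rough illustration of analyzing dropoff, the sketch below computes the
fraction of users lost between each pair of adjacent steps. The step names
follow the example funnel above, but the user counts and the helper name
dropoff_rates are made up for illustration and do not come from any real data.

```python
# Hypothetical user counts at each funnel step (illustrative numbers only).
funnel = [
    ("Visit homepage", 10000),
    ("Search or browse category", 6000),
    ("Click product image", 3000),
    ("Add to cart", 900),
    ("Check out", 450),
]


def dropoff_rates(steps):
    """Return (from_step, to_step, dropoff) for each adjacent pair of steps."""
    rates = []
    for (name_a, n_a), (name_b, n_b) in zip(steps, steps[1:]):
        # Dropoff is the share of users at step A who never reach step B.
        rates.append((name_a, name_b, 1 - n_b / n_a))
    return rates


for src, dst, rate in dropoff_rates(funnel):
    print(f"{src} -> {dst}: {rate:.0%} dropoff")
```

A step with an unusually high dropoff compared to its neighbors is a natural
candidate for an experiment.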

It's also worth noting that the flow through a funnel might be idealized
compared to actual user practice. In the above example, users might perform
multiple searches in a single session, or want to purchase multiple things. A
user might access the site through a specific link, bypassing the top part of
the funnel. Refining the funnel and being specific about the kinds of events
that are expected can help you create a consistent, reliable design and
analysis.

Unit of Diversion

Once you have a funnel, think about how you can implement your experimental
manipulation in the funnel. If the goal of the above experiment was to change
the way the site looks after a user clicks on a product image, we need to
figure out a way to assign users to either a control group or an experimental
group. The entity on which you make this assignment (for example a pageview, a
cookie, or an account) is known as the unit of diversion. Depending on the
type of experiment you have, you might have different options for diversion,
each with its own pros and cons:

  • Event-based diversion (e.g. pageview): Each time a user loads up the page of
    interest, the experimental condition is randomly rolled. Since this ignores
    previous visits, it can create an inconsistent experience if the condition
    causes a user-visible change.
  • Cookie-based diversion: A cookie is stored on the user's device, which
    determines their experimental condition as long as the cookie remains on the
    device. Cookies don't require a user to have an account or be logged in, but
    can be subverted through anonymous browsing or a user just clearing out cookies.
  • Account-based diversion (e.g. User ID): User IDs are randomly divided into
    conditions. Account-based diversion is reliable, but requires users to have
    accounts and be logged in. This means that your pool of data might be limited
    in scope, and you'll need to consider the risks of using personally
    identifiable information.
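
The difference between the diversion options above can be sketched in code.
This is a minimal sketch, not a prescribed implementation: hashing a stable ID
is one common way to get consistent assignment, and the function names, the
experiment name, and the 50/50 split are all assumptions made for the example.

```python
import hashlib
import random


def event_based_condition(rng=random):
    """Event-based diversion: re-roll the condition on every pageview."""
    return "experiment" if rng.random() < 0.5 else "control"


def id_based_condition(unit_id, experiment_name="product_page_redesign"):
    """Cookie- or account-based diversion: hash a stable ID into a condition.

    The same ID always lands in the same group, so the experience stays
    consistent for as long as the cookie or account persists.
    """
    # Including the (hypothetical) experiment name keeps assignments
    # independent across different experiments.
    key = f"{experiment_name}:{unit_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "experiment" if bucket < 50 else "control"


# Repeated pageviews by the same cookie get the same condition...
print(id_based_condition("cookie-8f3a"), id_based_condition("cookie-8f3a"))
# ...while event-based diversion may flip between pageviews.
print(event_based_condition(), event_based_condition())
```

With cookie-based diversion the unit_id would be a value stored in the cookie;
with account-based diversion it would be the User ID.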

When it comes to selecting a unit of diversion, the consistency of the
experience required can be a major factor to consider. For the example provided,
we need something more consistent than pageview events. So we then consider the
cookie-based diversion. If the differences in interface between control and
experiment are fairly minor, then we're probably okay with cookie-based
diversion. But if we think that users will notice the change and we believe
that it will have a major effect on experience, then we might be inclined to
choose an account-based diversion.

Invariant and Evaluation Metrics

A funnel will also be of benefit when it comes to deciding on metrics to track
and analyze as part of the experiment. The immediate features that come out of
a funnel come in the form of counts and ratios. For example, we could track the
number of times a search results in a product being selected (a count), or the
ratio of selections to searches, which are adjacent steps in the funnel (a
ratio).
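
To make the distinction concrete, here is a small sketch that derives both
kinds of feature from an event log. The log format, the event names, and the
records themselves are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical event log: one (session_id, event_type) pair per recorded event.
events = [
    ("s1", "search"), ("s1", "select_product"),
    ("s2", "search"),
    ("s3", "search"), ("s3", "select_product"),
    ("s4", "search"),
]

counts = Counter(event_type for _, event_type in events)

searches = counts["search"]            # a count
selections = counts["select_product"]  # a count
# A ratio between two adjacent funnel steps; guard against division by zero.
selection_rate = selections / searches if searches else 0.0

print(searches, selections, selection_rate)  # prints: 4 2 0.5
```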

There are two major categories in which we can use these features: as
evaluation metrics or as invariant metrics. Evaluation metrics were mentioned on
the previous page as the metrics by which we compare groups. Ideally, we hope
to see a difference between groups that will tell us if our manipulation was
a success. We might want to see an increased click-through-rate from search
results to products, or an increase in overall revenue. On the flip side,
invariant metrics are metrics that we hope will not be different between
groups. Metrics in this category serve to check that the experiment is running
as expected. For example, in an experiment with cookie-based diversion, the
number of cookies generated for each condition would be a good invariant metric.
Another invariant metric could compare the distribution of times at which
cookies were generated, to check for bias in the randomization procedure.
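
One way to check the cookie-count invariant metric is sketched below. It
assumes an intended 50/50 split and uses a normal approximation to the
binomial distribution. Both the choice of test and the cookie counts are
assumptions for this example rather than something the lesson prescribes.

```python
import math


def invariant_count_check(n_control, n_experiment, expected_share=0.5):
    """Two-sided z-test that the control share matches the expected share."""
    n_total = n_control + n_experiment
    observed_share = n_control / n_total
    # Standard error of a proportion under the expected split.
    std_error = math.sqrt(expected_share * (1 - expected_share) / n_total)
    z = (observed_share - expected_share) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical cookie counts for the two conditions.
z, p = invariant_count_check(n_control=5040, n_experiment=4960)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value would suggest that the groups are more unbalanced than chance
alone would explain, which is a reason to inspect the assignment procedure
before trusting the evaluation metrics.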

We're not limited to tracking just one metric of each type. It's not unusual
to track multiple invariant metrics as checks on the system's integrity, or
multiple evaluation metrics to check different potential facets of a
manipulation's effects. Don't think that you need to track every possible
metric, however. It's better to focus on a few key metrics, ignoring features
that might be less reliable or highly correlated with other, more informative
features. We'll discuss statistical considerations surrounding metrics in the
next lesson.